Abstract

This analysis is based upon enrolment and completion data collected for a total of 221 Massive 
Open Online Courses (MOOCs). It extends previously reported work (Jordan, 2014) with an 
expanded dataset; the original work is extended to include a multiple regression analysis of 
factors that affect completion rates and analysis of attrition rates during courses. Completion 
rates (defined as the percentage of enrolled students who completed the course) vary from 0.7% 
to 52.1%, with a median value of 12.6%. Since their inception, enrolments on MOOCs have fallen
while completion rates have increased. Completion rates vary significantly according to course 
length (longer courses having lower completion rates), start date (more recent courses having 
higher percentage completion) and assessment type (courses using auto grading only having 
higher completion rates). For a sub-sample of courses where rates of active use and assessment 
submission across the course are available, the first and second weeks appear to be critical in 
achieving student engagement, after which the proportion of active students and those submitting 
assessments levels out, with less than 3% difference between them. 

Keywords: Distance education; open learning; online learning; massive open online courses 
(MOOCs) 
Introduction

Since Massive Open Online Courses (MOOCs) became mainstream in 2012, completion rates 
have been a controversial topic. While six-figure enrolment figures have garnered intense media 
attention, critics have highlighted that few students complete courses relative to more formal 
modes of learning. Counter to this, others have argued that the emancipatory effect of free online 
access to education allows students to take what they need from MOOCs to meet their own 
learning goals without formally completing courses; to examine completion rates is potentially 
misleading (LeBar, 2014). This study takes a perspective which acknowledges that while 
completing courses is not the only way of benefitting from participation in a MOOC, it is better to 
try to understand the factors that affect completion rates and any implications for course design 
than to ignore them. 

In the literature on MOOCs there is a lack of peer-reviewed research publications which draw 
upon more than a small number of courses restricted to single institutions, and the need for
meta-analysis independent of MOOC platform providers is a key issue for the field at present. For
example, Kizilcec, Piech and Schneider (2013) identified learner populations based on analysis of 
three early Coursera MOOCs; replication of this analytical approach on data from the Futurelearn 
platform identified different groups (Ferguson & Clow, 2015). Enrolment and completion figures 
are the type of data that is most widely publicly available for analysis across the field, which is 
necessary to ensure that conclusions are generalisable and not particular to a small number of 
detailed cases. 

Understanding the factors which affect completion rate can be approached from the perspective 
of characteristics of learners and their reasons for participating, or improving the design of 
courses. Greater focus to date has been on the motivations and behaviours of students in relation 
to success (for example, Breslow et al., 2013; Kizilcec et al., 2013; Koller, Ng, Do & Chen, 2013;
Rosé et al., 2014). However, studies have suggested that those most likely to succeed in MOOCs
are the students who are already most educationally privileged (Emanuel, 2013; Koller and Ng,
2013). Addressing MOOC completion rates from a course design perspective, in order to help as 
many diverse learners to complete as would wish to, is a pedagogical issue and a challenge for 
course designers and instructors. In order for MOOCs to realise their potential in making 
education open to all, it is not sufficient to simply make pre-existing course materials freely 
available online. Completion rates are relatively low even among students who intend to complete 
the course (an average of 22%; Reich, 2014) so for those students who intend to complete courses 
or engage with the course as designed, not considering completion rates prevents exploration of 
what can be done by educators to facilitate further student success. 

This paper reports work undertaken to extend a previous study on initial trends in MOOC 
completion rate (Jordan, 2014), which remains one of the largest MOOC studies in terms of 
number of courses included. The dataset is expanded (to include more recent data, and not 
restricted to the major MOOC providers) and multiple regression analysis is used to explore the
combined effects of a range of basic factors in relation to completion rate. Additionally,
quantitative measures of engagement across the course of a MOOC are examined in instances 
where this data is available. 


Data Collection and Analysis 

The approach to data collection built upon an initial dataset (reported in Jordan, 2014) which 
combined enrolment and completion figures from news stories, MOOCs the author had taken as a 
participant, and crowdsourced figures submitted by other students and MOOC instructors via a 
blog. In order to expand the dataset, the author compiled a list of completed MOOCs not present 
in the original dataset (based on information from MOOC-aggregating websites such as
https://www.class-central.com/) and performed a series of Internet searches to find sources 
containing enrolment and completion information for the courses. Information was found and 
included relating to a total of 221 MOOCs. Information about 35.3% of the courses was located in 
news articles; 33.6% were sourced from academic reports and articles; 14.5% from instructors’ 
social media; 9.9% from students’ social media; and 6.8% from course sites. Further information 
about courses, including course length and assessment type, was gathered via signing up to the 
courses, asking participants or consulting MOOC-aggregating websites. Information about 
university reputation was included based on the scores used by the Times Higher Education 
World University Rankings (Times Higher Education, 2013). No further courses were added to 
the dataset after November 2013. To access the full dataset, including links to each source and 
further recent additions, see the online data visualization (Jordan, 2015). The dataset can be 
summarised as follows: 

• A total of 221 courses were included in the dataset. Within this sample, enrolment figures 
were available for 220 courses; completion figures were available for 129 courses; and 
figures relating to engagement over the course of a MOOC were available for 59 courses. 
The two most common definitions of engagement across the duration of courses used by 
the sources were the number of students accessing resources, or completing assignments. 

• Courses from a range of different MOOC providers were included. Coursera (120 courses) 
and Open2Study (43 courses) were the best represented platforms, although courses from
12 other providers and 19 independent courses were also included. A total of 78 
institutions were present in the dataset. 

• A variety of different definitions of completion are in operation. Of the 129 courses for 
which completion data was available, the most prevalent definition of completion was 
earning a certificate (93 courses). Other definitions (as used by their data sources) 
included ‘completed course’ (14 courses), ‘passed course’ (10 courses), ‘completed
assignments’ (6 courses), ‘memorably active participants’, achieving a ‘strong final score’, 
active contributors at end of course, certificates purchased, ‘kept up’ with whole course, 
or ‘took final exam’ (1 course each). 

• In addition to enrolment and completion figures, other data collected about courses 
included start date, length of course (in weeks), and assessment type used. Assessment 
type was categorised using three basic categories: auto grading only (92 courses), peer 
grading only (10 courses), or a combination of both auto and peer grading (23 courses). 

Three types of analysis were applied to the dataset. Since the expanded dataset encompassed a 
wider time period, linear regression analysis was used to gain an overview of whether MOOC 
enrolments and completion rates were changing over time. 

Multiple regression analysis was then used with a sub-sample of courses to explore whether 
MOOC completion rates are significantly correlated with a range of factors. Multiple regression is 
“an extension of simple regression in which an outcome is predicted by two or more predictor 
variables” (Field, 2009, p. 790). As an analytical approach, multiple regression offers the
advantage of being able to examine the combined effect of multiple variables upon an outcome
(in this case, completion rate). The factors examined included assessment type, course length,
date, and university ranking score. While course length, date and university ranking had 
previously been examined individually (Jordan, 2014), the larger dataset offered the opportunity 
to consider the factors together. Platform and MOOC type were not included due to wide variation 
in sample sizes. Note that full information about all of the factors was not available for every 
course in the sample, so not all of the courses in the dataset were used in the regression analysis 
(see results and discussion). The statistical analyses were undertaken using SPSS (Field, 2009). 
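
As a minimal sketch of the technique (not the SPSS procedure used in the study), ordinary least squares with several predictors can be written in plain Python by solving the normal equations. The course records below are invented for illustration, and the function names `solve` and `fit_ols` are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return [intercept, b1, b2, ...] minimising squared error for y given rows."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Invented records: (course length in weeks, uses peer grading 0/1)
predictors = [(4, 0), (6, 0), (8, 1), (10, 0), (12, 1), (5, 1)]
# Invented outcome generated exactly from 30 - 1.5 * length - 8 * peer
completion = [30 - 1.5 * w - 8 * p for w, p in predictors]

coefs = fit_ols(predictors, completion)
print([round(c, 3) for c in coefs])
```

Because the invented outcome is an exact linear function of the two predictors, the fit recovers the generating coefficients up to floating-point error. Real data would leave residual variance, which is what the R² and F statistics reported in the results summarise.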

The third type of analysis focused upon a smaller sample of courses for which data was available 
about number of students participating week-by-week across the course of live MOOCs. In some 
cases, raw data was not available and required extraction from charts using software (Rohatgi, 
2014). This extends the original finding that approximately fifty percent of potential MOOC
students who sign up go on to become active users (Jordan, 2014) and examines how this trend
progresses. Participation is defined in two ways by data sources: either the number of students 
viewing course materials, or the number submitting assignments. To allow comparison, these 
values were expressed as a percentage of the total enrolment for each course. The resulting curves 
were compared visually and average curves constructed using mean values. 
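
The normalisation and averaging steps described above might be sketched as follows. The weekly counts are invented. Averaging each week over only those courses that ran that long is an assumption, since the text does not say how courses of different lengths were combined.

```python
from statistics import mean


def as_percent_of_enrolment(weekly_counts, enrolled):
    """Express weekly participant counts as a percentage of total enrolment."""
    return [100.0 * n / enrolled for n in weekly_counts]


def mean_curve(curves):
    """Average several per-course curves week by week.

    ASSUMPTION: each week is averaged over the courses that lasted that long.
    """
    longest = max(len(c) for c in curves)
    return [mean(c[w] for c in curves if len(c) > w) for w in range(longest)]


# Invented weekly active-student counts and enrolments for three courses
courses = [
    {"enrolled": 10000, "active": [5000, 3000, 2500, 2400]},
    {"enrolled": 20000, "active": [11000, 7000, 5000, 4600]},
    {"enrolled": 5000, "active": [2400, 1500, 1200, 1100, 1000, 950]},
]

curves = [as_percent_of_enrolment(c["active"], c["enrolled"]) for c in courses]
average = mean_curve(curves)
print([round(v, 1) for v in average])
```

Expressing every curve relative to its own enrolment is what allows courses of very different sizes to be compared on the same axis.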
Results


The first analyses focused upon whether enrolments and completion rates appeared to be 
changing over time. Regression analysis was performed to examine the extent to which course 
start date predicts the number of students enrolled (figure 1), and the extent to which course start 
date predicts the percentage of students that complete the course (figure 2). Prior to both 
analyses, a Box-Cox transformation was applied as the residuals were not normally distributed. 
Date significantly predicted total enrolment figures by the following formula: Enrolled^0.180902
= 66.3311 - 0.00147092 × Date (n = 219, R² = 0.0252, p = 0.019). Note that the correlation here is
negative; as time has progressed, the size of the average MOOC has decreased. Date also
significantly predicted completion rate by the following formula: PercentCompleted^0.5 =
-152.428 + 0.00377601 × Date (n = 129, R² = 0.1457, p < 0.001). In contrast, this represented a
positive correlation, so while total enrolments have decreased, completion rates have increased 
over time. 
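
To read the two fitted formulas on their original scales, the power transformation has to be inverted. The sketch below does this for both equations. The numeric encoding of Date is not stated in the text, so the serial day number used here (41275 standing for 1 January 2013) is an assumption chosen only for illustration, and the function names are hypothetical.

```python
def predicted_enrolment(date):
    """Invert Enrolled^0.180902 = 66.3311 - 0.00147092 * Date."""
    transformed = 66.3311 - 0.00147092 * date
    return transformed ** (1 / 0.180902)


def predicted_completion(date):
    """Invert PercentCompleted^0.5 = -152.428 + 0.00377601 * Date."""
    transformed = -152.428 + 0.00377601 * date
    return transformed ** 2


# ASSUMPTION: Date is a serial day number in which 41275 is 1 January 2013.
# The encoding is not given in the text; these values are illustrative only.
jan_2013 = 41275
jan_2014 = jan_2013 + 365

for label, d in [("Jan 2013", jan_2013), ("Jan 2014", jan_2014)]:
    print(label, round(predicted_enrolment(d)), round(predicted_completion(d), 1))
```

The back-transformation is only meaningful where the transformed value is non-negative. Under the assumed encoding, the later date gives a smaller predicted enrolment and a larger predicted completion rate, matching the directions reported above.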



Figure 1. Scatterplot of MOOC course enrolments plotted against course start date (n=219).

Figure 2. Scatterplot of percentage of students who completed courses plotted against course start
date (n=129).


Multiple Regression Analysis 

A multiple regression analysis was carried out in order to examine the combined effects of the 
factors upon completion rate, which had previously been examined individually (Jordan, 2014). 
An initial run of the analysis included the following factors: University ranking score, start date, 
course length (in weeks), total number of students enrolled, and assessment type. Assessment 
type was a categorical variable comprising three categories (‘Auto grading only’, ‘peer grading 
only’, and ‘auto and peer grading’). The start date variable was defined as the date that each 
course formally launched; for the purposes of the analysis, this information was converted to the 
Lilian date format. The result of this initial analysis showed that while the model explained a 
significant amount of the variance in completion rate (F(6, 53) = 13.98, p < .05, R² = .61, adjusted R²
= .57), not all of the factors significantly predicted completion rate. In light of this, university
ranking and total enrolment were excluded from the analysis, and assessment types recoded into 
two categories based on whether or not peer grading was used at all. 
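
The recoding step can be illustrated with a small lookup that collapses the three assessment categories into a single indicator of whether peer grading was used at all. The function name `uses_peer_grading` is hypothetical; the category labels follow the text.

```python
def uses_peer_grading(assessment_type):
    """Collapse the three assessment categories into a 0/1 indicator."""
    mapping = {
        "auto grading only": 0,
        "peer grading only": 1,
        "auto and peer grading": 1,
    }
    return mapping[assessment_type.strip().lower()]


sample = ["Auto grading only", "peer grading only", "auto and peer grading"]
print([uses_peer_grading(s) for s in sample])  # prints [0, 1, 1]
```

A two-level indicator enters the regression as a single predictor, which avoids estimating a separate coefficient for the small ‘peer grading only’ group.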

Massive Open Online Course Completion Rates Revisited: Assessment, Length And Attrition
Jordan
This work is licensed under a Creative Commons Attribution 4.0 International License.

The analysis proceeded to examine the extent to which start date, course length and use of peer
grading predicted completion rate. An analysis of standard residuals was carried out, which
showed that the data contained no outliers (Std. Residual Min = -2.56, Std. Residual Max = 2.07).
Tests to see if the data met the assumption of collinearity indicated that multicollinearity was not
a concern (course length, Tolerance = .96, VIF = 1.04; start date, Tolerance = .99, VIF = 1.01; use
of peer grading, Tolerance = .96, VIF = 1.05). The data met the assumption of independent errors
(Durbin-Watson value = 1.08). The histogram of standardised residuals indicated that the data
contained approximately normally distributed errors, as did the normal P-P plot of standardised
residuals. The scatterplot of standardised predicted values showed that the data met the
assumption of linearity but may have violated the assumption of homoscedasticity. Note that
violating this assumption does not invalidate the analysis, which is accurate based upon the
sample used, but reduces the likelihood that the model generalises to the population (this would
not be certain, but more likely, if all the assumptions were met; Field, 2009). The
heteroscedasticity observed is not severe and is most likely caused by the presence of significant
variables that are not included in the model. Given the opportunistic nature of the data collection
it would be surprising if all significant variables had been identified. However, this does not
detract from the significant factors that have been determined. The data also met the assumption
of non-zero variances (course length, Variance = 13.58; start date, Variance = 15480621.9; use of
peer grading, Variance = .195).
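
Two of the diagnostics above have simple closed forms, sketched here for reference. The variance inflation factor is the reciprocal of tolerance, and the Durbin-Watson statistic is the sum of squared successive residual differences divided by the sum of squared residuals. The tolerances are those given in the coefficients table; the residuals are invented, because the study's residuals are not published.

```python
def vif_from_tolerance(tolerance):
    """Variance inflation factor is the reciprocal of tolerance."""
    return 1.0 / tolerance


def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Tolerances as reported in the coefficients table
reported = [("start date", 0.991), ("course length", 0.958), ("peer grading", 0.955)]
for name, tol in reported:
    print(name, round(vif_from_tolerance(tol), 3))

# Invented residuals, used only to show the calculation
example_residuals = [1.2, 0.8, -0.5, -1.1, 0.3, 0.9, -0.7]
print(round(durbin_watson(example_residuals), 3))
```

The computed factors agree with the reported VIF values to within the rounding of the tolerances. The Durbin-Watson statistic always lies between 0 and 4, with values near 2 indicating uncorrelated successive residuals.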

Using the enter method it was found that course length, start date and use of peer grading explain
a significant amount of the variance in the completion rate (F(3, 117) = 57.7, p < .05, R² = .60,
adjusted R² = .59). The model summary, ANOVA table and coefficients table from the analysis are
shown in tables 1, 2 and 3, respectively.

Table 1

Model Summary (b)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | Durbin-Watson |
|---|---|---|---|---|---|
| 1 | .773 (a) | .597 | .587 | 6.906 | 1.084 |

a. Predictors: (Constant), PeerGrading, DateISO, Length

b. Dependent Variable: PercentCompleted

Table 2

ANOVA Table (a)

| Model | Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 8260.829 | 3 | 2753.610 | 57.741 | .000 (b) |
| | Residual | 5579.573 | 117 | 47.689 | | |
| | Total | 13840.402 | 120 | | | |

a. Dependent Variable: PercentCompleted

b. Predictors: (Constant), PeerGrading, DateISO, Length
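
As a check on how the summary statistics relate to one another, R², adjusted R² and F can be recomputed from the sums of squares and degrees of freedom in the ANOVA table. This is a worked arithmetic sketch using the standard definitions, not output from the original analysis.

```python
ss_regression = 8260.829
ss_residual = 5579.573
df_regression = 3
df_residual = 117

ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual  # n - 1

# Proportion of variance explained by the model
r_squared = ss_regression / ss_total
# Penalise R squared for the number of predictors
adjusted_r_squared = 1 - (1 - r_squared) * df_total / df_residual
# Ratio of the regression and residual mean squares
f_statistic = (ss_regression / df_regression) / (ss_residual / df_residual)

print(round(r_squared, 3), round(adjusted_r_squared, 3), round(f_statistic, 2))
```

The recomputed values agree with the reported R Square of .597, Adjusted R Square of .587 and F of 57.741 to the precision shown.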


Table 3

Coefficients Table (a)

| Model 1 | Unstandardised B | Unstandardised Std. Error | Standardised Beta | t | Sig. | Tolerance | VIF |
|---|---|---|---|---|---|---|---|
| (Constant) | [illegible] | 5.029 | | 11.409 | .000 | | |
| DateISO | -.001 | .000 | -.343 | -5.816 | .000 | .991 | 1.009 |
| Length | -1.751 | .209 | -.503 | -8.391 | .000 | .958 | 1.043 |
| PeerGrading | -9.606 | 1.472 | -.392 | -6.527 | .000 | .955 | 1.047 |

a. Dependent Variable: PercentCompleted

The correlation between start date and completion rate was positive, in that completion rates
increased over time. The other factors - course length and whether or not peer grading was used
- were both negatively correlated with completion rate (Figures 3 and 4), with longer courses and
those which use peer grading having lower completion rates than shorter or auto-graded courses.
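
To make the size of these effects concrete, the unstandardised coefficients can be used to compare two courses that share a start date. The sketch below takes the magnitudes from the coefficients table and the directions from the text above. The function name `completion_difference` is hypothetical, and the result is a model-based difference in percentage points rather than an observed one.

```python
# Magnitudes from the coefficients table; directions as stated in the text.
B_LENGTH = -1.751  # percentage points per additional week of course length
B_PEER = -9.606    # percentage points when peer grading is used


def completion_difference(extra_weeks, adds_peer_grading):
    """Predicted change in completion rate (percentage points) between two
    courses with the same start date."""
    return B_LENGTH * extra_weeks + B_PEER * (1 if adds_peer_grading else 0)


print(round(completion_difference(4, False), 3))
print(round(completion_difference(4, True), 3))
```

Under the model, a course four weeks longer is predicted to have a completion rate about 7 percentage points lower, and about 16.6 points lower if it also introduces peer grading.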



Figure 3. Completion rate plotted against course length in weeks. 
Figure 4. Boxplots showing distributions of completion rates according to different assessment
types. 


This categorization is not elaborate and, with an R² of 60%, a reasonable proportion of the
variance remains unexplained. It does however demonstrate that it is possible to gain insights
into the impact of learning design decisions by considering completion rate. Availability of data is 
an obstacle to further detailed analyses (such as considering the use of exams, different forms of 
auto and peer grading, formative or summative assessment) at this stage. 

Attrition Rates During Live Courses 

The previous study considered the conversion rate between students who enrol and then go on to 
become active in courses, by accessing course materials or logging in to the course site (Jordan, 
2014). This part of the analysis sought to extend this by considering the levels of use week-by-week
during live courses, for MOOCs where this level of detail is available. The data collected by
courses only focused upon levels of use during the periods which the course was active. 
Participation was defined in two ways; either by the number accessing course materials (‘active 
students’), or the number who submitted assignments. Data about active students was available 
for 59 courses (figure 5), and those submitting assignments in 54 courses (figure 6). The sample 
included data from a range of courses and platforms, including 43 Open2Study courses
(Open2Study, 2013), 17 Coursera courses (Belanger, 2013; Duke University, 2012; Grainger,
2013; University of Edinburgh, 2013; Severance, 2013), one edX course (Breslow et al., 2013),
and two platform-independent courses (Cross, 2013; Weller, 2013). Note that the majority of
courses included in figures 5, 6 and 7 were four weeks long due to the open availability of data
from the Open2Study platform, upon which all courses are four weeks in duration.



Figure 5. Proportion of active students (accessing course materials) per week since start of course 
as a percentage of total enrolment (n=59). 

Figure 6. Proportion of students submitting assessments per week since start of courses as a 
percentage of total enrolment (n=54). 


The curves shown in Figures 5 and 6 are notable in two main ways. First, while the sample 
contains a variety of different types of MOOCs (different platforms, institutions, and modes of
teaching and assessment are present), the curves follow a similar overall trend. Around half of a
MOOC’s enrolled students will not show up, and the first two weeks of a course appear to be
critical in gaining student engagement. Second, after the first two weeks, there is little difference 
between the two measures of engagement - those accessing course materials and those 
submitting assignments. The difference between the two measures is shown in figure 7. After 
week 3, the difference is less than five percent for all but two courses. This calls into question the 
extent to which ‘lurking’ (that is, selectively accessing course materials but not actively 
participating in assessments) is being used as a participation strategy, underlining the need for 
further research in this area. 
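
The comparison between the two engagement measures amounts to a week-by-week subtraction of one curve from the other. The sketch below uses invented curves for a single four-week course. The helper `first_week_within` is hypothetical and simply reports the first week from which the gap stays under a chosen threshold.

```python
def weekly_gap(active_pct, submitting_pct):
    """Percentage-point gap between students accessing materials and those
    submitting assignments, week by week."""
    return [a - s for a, s in zip(active_pct, submitting_pct)]


def first_week_within(gaps, threshold):
    """Return the first week (1-indexed) from which every later gap stays
    below the threshold, or None if that never happens."""
    for week in range(len(gaps)):
        if all(g < threshold for g in gaps[week:]):
            return week + 1
    return None


# Invented curves for one four-week course, as percentages of enrolment
active = [52.0, 31.0, 25.0, 23.0]
submitting = [38.0, 24.0, 22.0, 21.0]

gaps = weekly_gap(active, submitting)
print(gaps, first_week_within(gaps, 5.0))  # prints [14.0, 7.0, 3.0, 2.0] 3
```

Both curves must be expressed as percentages of the same total enrolment for the subtraction to be meaningful.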
Figure 7. Difference between percentage of active students and percentage of students submitting
assignments per week since the start of courses as a percentage of total enrolment (n=50).


Note that since this work was undertaken, a similar study has been published based on rich, 
detailed data from 16 Coursera-based MOOCs provided by the University of Pennsylvania (Perna, 
Ruby, Boruch, Wang, Scull, Ahmad & Evans, 2014). This study corroborates the findings here in 
that the initial weeks of courses are key for student engagement. Over the course of the MOOCs
in the Perna et al. (2014) sample, broadly similar attrition curves are demonstrated, and the gap 
between students accessing materials and taking assignments narrows over time. 


Conclusions 

The multiple regression analysis highlights that it may be possible to gain insights into the 
impacts of different aspects of MOOC course design by considering completion rate across a large 
sample of courses. The results here may be useful for educators to consider when designing 
MOOCs, although there are limitations to this study and further research would be valuable. 
Factors that significantly predicted completion rate included start date, course length and
assessment type. Completion rates were positively correlated with start date; that is, more recent 
courses demonstrate higher percentage completion. This is likely due to a decrease in average 
total enrolments over time, but may also reflect feedback and iterative design of courses. 

On the basis of the negative correlation with course length, coupled with the attrition observed in 
the initial weeks of courses, a case could be made for shorter, more modular courses. Greater 
signposting would be required between courses for those students looking to create a more 
substantial programme of learning. Shorter courses with better guidance about how they could be 
combined could also benefit those students who prefer to direct their own learning by making it 
easier to find the parts of a course that they value; this would also allow for these students’ MOOC 
achievements to be recognised. Modularisation for MOOCs has already been suggested by some 
(for example, Bol cited in Harvard Magazine, 2013; Challen & Seltzer, 2014); the evidence here 
provides an empirical rationale for such developments, and further research would be valuable to 
examine the effects in practice. Note that in contrast to this finding Perna et al. (2014) reported 
no relationship between course length and completion rate. Given the similarity between the 
attrition curves reported by Perna et al. and those presented here, it is likely that the lack of a 
negative correlation is due to small sample size (16 courses from a single institution and platform, 
several of which are included in the dataset here). 

The negative correlation between use of peer grading for assessments and completion rate 
suggests that course designers should carefully consider whether to use this as an assessment 
mechanism, or whether automated assessments would meet their educational goals. For example, 
in the case of peer grading short essays using a rubric based on factual recall, similar results could 
be achieved using multiple choice questions. In contrast, larger, project-based peer graded 
assessments may yield more significant learning gains for the students who do complete them as 
they are arguably more demanding and there is an aspect of vicarious learning from assessing 
others, though the quality of this also requires empirical verification. Further research into the 
use of peer grading would be valuable to investigate the reasons behind this finding. For example, 
it could be hypothesised that the lower completion rate in courses using peer grading may be due 
to peer assessments being more rigorous, disengagement due to having to wait for feedback, or 
reasons why students may choose not to attempt them (such as proficiency in English, for 
example). Another possible factor influencing increased attrition in peer graded MOOCs may be 
disengagement from students from minority cultural backgrounds as MOOC students assessing 
their peers have been shown to give higher marks to students from their own country (Kulkarni, 
Koh, Le, Papadopoulos, Cheng, Koller & Klemmer, 2013). A better understanding of these issues
is needed to clarify the circumstances in which peer grading could be recommended. 

While this study provides some insights into the potential impact of learning design decisions 
upon completion rates, it does have its limitations and further empirical work is required. The 
principal limitation of this work is the availability of data from courses. This sample only reflects 
courses for which data is publicly available; a more detailed picture would be possible if further
data was made available. The definition of completion rate as a percentage of enrolled students 
may be over-simplistic and subject to wide variations in enrolments (Ho et al., 2014). More 
nuanced definitions have been called for to reflect the numerous ways students may interact with 
MOOCs (DeBoer et al., 2014). However, the fact remains that total enrolments and certificate-earning
completers are the statistics most frequently present in the public domain. As the body of
academic literature related to MOOCs grows, the potential for more detailed and robust meta-analysis
is likely to increase in the future.